library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(tidyquant)
## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
## Loading required package: PerformanceAnalytics
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## Warning: package 'readr' was built under R version 4.0.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x dplyr::first()           masks xts::first()
## x lubridate::intersect()   masks base::intersect()
## x dplyr::lag()             masks stats::lag()
## x dplyr::last()            masks xts::last()
## x lubridate::setdiff()     masks base::setdiff()
## x lubridate::union()       masks base::union()
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(imputeTS)
## 
## Attaching package: 'imputeTS'
## 
## The following object is masked from 'package:zoo':
## 
##     na.locf
library(lubridate)
# Problem 1
  1. Major indexes in the United States are the Dow Jones, the S&P 500 and the Nasdaq Composite. An index is a method by which to measure the performance of a group of assets. You can think of an index as a basket of securities which helps you understand how a certain sector is performing.

options("getSymbols.warning4.0"=FALSE)
options("getSymbols.yahoo.warning"=FALSE)

tickers = c("WMT","XOM","MMM","IBM","JPM","AAPL","KO","MCD","NKE","AXP" )
for (i in tickers){
  getSymbols(i,
             from = "2020-01-01",
             to = "2022-12-31")}

x <- list(
  title = "date"
)
y <- list(
  title = "value"
)

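# Columns must follow the same order as `tickers`, because the column
# names are assigned from `tickers` below.
stock <- data.frame(WMT$WMT.Adjusted,
                    XOM$XOM.Adjusted,
                    MMM$MMM.Adjusted,
                    IBM$IBM.Adjusted,
                    JPM$JPM.Adjusted,
                    AAPL$AAPL.Adjusted,
                    KO$KO.Adjusted,
                    MCD$MCD.Adjusted,
                    NKE$NKE.Adjusted,
                    AXP$AXP.Adjusted)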

stock <- data.frame(stock,rownames(stock))


colnames(stock) <- append(tickers,'Dates')


stock$date<-as.Date(stock$Dates,"%Y-%m-%d")
g4 <- ggplot(stock, aes(x = date)) +
  geom_line(aes(y = WMT, colour = "red")) +
  geom_line(aes(y = XOM, colour = "red")) +
  geom_line(aes(y = MMM, colour = "red")) +
  geom_line(aes(y = IBM, colour = "red")) +
  geom_line(aes(y = JPM, colour = "red")) +
  geom_line(aes(y = AAPL, colour = "blue")) +
  geom_line(aes(y = KO, colour = "blue")) +
  geom_line(aes(y = MCD, colour = "blue")) +
  geom_line(aes(y = NKE, colour = "blue")) +
  geom_line(aes(y = AXP, colour = "blue")) +
  # Map the "red"/"blue" labels to the actual colours; without this, ggplot2
  # treats them as category names and assigns its default palette.
  scale_colour_manual(values = c(red = "red", blue = "blue")) +
  labs(
    title = "Stock Prices for Selected Companies in the S&P 500 and Dow Jones",
    subtitle = "From 2020-2022",
    x = "Date",
    y = "Adjusted Closing Prices") +
  guides(colour = guide_legend(title = "S&P500 (Red) vs Dow Jones (Blue)"))


ggplotly(g4) %>%
  layout(hovermode = "x")
  1. All the stocks in both indexes dropped in March 2020, when the pandemic started. In this small sample (10 stocks), the Dow Jones stocks are more spread out, while the S&P 500 stocks are more concentrated. About two months after Covid hit, quantitative easing from the Fed, combined with stimulus checks from the government, led people to put a lot of money into the stock market, which drove stock prices higher. Within the S&P 500 group, two companies, 3M (MMM) and Exxon Mobil (XOM), continued to increase from 2020 through 2022, while three others, JP Morgan (JPM), Walmart (WMT) and IBM (IBM), began to decline in 2022. A likely reason is that 3M and Exxon Mobil produce necessities that people buy and use every day. As the Fed raised interest rates and the risk of recession grew, businesses found it harder to operate, employees were let go, and people lost income. Thus, apart from three companies (3M, Exxon Mobil and Coca-Cola), the other seven stocks declined beginning in 2022. Over one year, the S&P 500 is down 8.15%, while the Dow Jones is down 2.15%.

  2. These two indexes represent the stock market. One can look at them to gauge whether the stock market, and to some extent the economy, is doing well or poorly. People can either earn or lose money in the stock market. However, over the long run, say 10 or more years, the stock market has historically risen, with an average return of about 8% per annum. In the short run, by contrast, there is a lot of volatility, and the chances of winning and losing are roughly equal. Growth and tech stocks increase the most during the expansion stage, when interest rates are low. They also experience large losses when the economy’s health turns south.
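
The percentage changes and the 8% long-run average quoted above can be made concrete with a little arithmetic. The sketch below is only illustrative: the prices are made-up numbers, not values from the downloaded data, and `simple_return()` and `compound_growth()` are hypothetical helper names introduced here.

```r
# Simple return between two prices: (end - start) / start.
simple_return <- function(p_start, p_end) {
  (p_end - p_start) / p_start
}

# Growth factor after `years` of compounding at a constant annual rate.
compound_growth <- function(rate, years) {
  (1 + rate)^years
}

# Hypothetical prices: a fall from 100 to 91.85 is a return of -8.15%.
one_year <- simple_return(100, 91.85)
round(100 * one_year, 2)

# An 8% average annual return compounds to roughly 2.16x over 10 years.
ten_year <- compound_growth(0.08, 10)
round(ten_year, 4)
```

Note that a constant 8% per year is an idealization; actual yearly returns vary widely around the average.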

# Problem 2
  1. A stationary time series is one whose statistical properties do not change over time: the mean and the variance stay constant over the period considered. Thus, any series with a trend or seasonality is not stationary.

  2. Select data only from station USC00189035, which is located at Soldier Home Cemetery in Washington, D.C. This station has 1826 observations, measured daily from 1973 to 1977.

climate <- read.csv('/Users/taikhanghao/Desktop/spring 23/time series/climateDC.csv')
soldier_home <- dplyr::filter(climate, STATION == "USC00189035")
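
Returning to the definition of stationarity given at the start of this problem, a small simulation may help. This is a sketch on synthetic data (not the climate data), and `half_means()` is a hypothetical helper: it compares the mean of the first and second halves of a series, which should be similar if the mean is constant over time.

```r
set.seed(1)
n <- 200
noise <- rnorm(n)

# A stationary series: white noise with constant mean and variance.
stationary <- noise

# A non-stationary series: the same noise plus a linear trend.
trending <- 0.5 * seq_len(n) + noise

# Compare the mean of the first and second halves of a series.
half_means <- function(x) {
  half <- length(x) %/% 2
  c(first = mean(x[1:half]), second = mean(x[(half + 1):length(x)]))
}

half_means(stationary)      # similar in both halves
half_means(trending)        # very different: the mean drifts upward
half_means(diff(trending))  # differencing removes the linear trend
```

Comparing half-sample means is only an informal check; it does not replace a formal stationarity test.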

Convert the date column from character type to Date format.

soldier_home$DATE<-as.Date(soldier_home$DATE,format = "%m/%d/%y")
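
Because the format string above uses `%y`, the years are read from two digits. The short sketch below, using made-up date strings rather than rows from the file, shows how R resolves the century and what happens when a string does not match the format.

```r
# %y reads a two-digit year; R maps 69-99 to 19xx and 00-68 to 20xx.
d1973 <- as.Date("01/15/73", format = "%m/%d/%y")
d2068 <- as.Date("01/15/68", format = "%m/%d/%y")

format(d1973)  # "1973-01-15"
format(d2068)  # "2068-01-15"

# A string that does not match the format becomes NA instead of an error.
bad <- as.Date("not a date", format = "%m/%d/%y")
is.na(bad)     # TRUE
```

For the 1973 to 1977 range used here the two-digit years resolve to the intended century, but it is worth checking for `NA` values after the conversion.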

Since there are some missing values, we use an exponentially weighted moving average to fill in the missing data.

soldier_home <- na_ma(soldier_home, k = 4, weighting = "exponential")
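
To show the idea behind this kind of imputation, here is a simplified base-R sketch. It is not the imputeTS implementation, and `ewma_fill()` is a hypothetical helper: each missing value is replaced by a weighted mean of the non-missing values up to `k` positions away, with weights that halve with each step of distance. The input vector is made up for illustration.

```r
# Simplified illustration: fill each NA with a weighted mean of the
# non-missing values up to k positions away; weight is 1 / 2^distance.
ewma_fill <- function(x, k = 4) {
  filled <- x
  for (i in which(is.na(x))) {
    idx <- setdiff(max(1, i - k):min(length(x), i + k), i)
    idx <- idx[!is.na(x[idx])]
    if (length(idx) == 0) next  # no usable neighbours; leave as NA
    w <- 1 / 2^abs(idx - i)
    filled[i] <- sum(w * x[idx]) / sum(w)
  }
  filled
}

temps <- c(10, 12, NA, 16, 18)
ewma_fill(temps, k = 2)  # the gap is filled with 14
```

In this example the two nearest neighbours (12 and 16) get weight 1/2 and the outer two (10 and 18) get weight 1/4, so the weighted mean is 21 / 1.5 = 14.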

Here, I plot the temperature and the precipitation at Soldier Home from 1973 to 1977. The objective is to see how temperature and precipitation relate to each other.

fig <- plot_ly()
# Add traces
fig <- fig %>% add_trace(x = soldier_home$DATE, y = soldier_home$TOBS, name = "Temperature", mode = "lines", type = "scatter")

ay <- list(
  tickfont = list(color = "red"),
  overlaying = "y",
  side = "right",
  title = "Millimeters")

fig <- fig %>% add_trace(x = soldier_home$DATE, y = soldier_home$PRCP, name = "Precipitation", yaxis = "y2", mode = "lines", type = "scatter")

# Set figure title, x and y-axes titles
fig <- fig %>% layout(
  title = "Temperature and Precipitation in Soldier Home from 1973 to 1977", yaxis2 = ay,
  xaxis = list(title="Date"),
  yaxis = list(title="Fahrenheit")
)%>%
  layout(plot_bgcolor='#e5ecf6',
          xaxis = list(
            zerolinecolor = '#ffff',
            zerolinewidth = 2,
            gridcolor = '#ffff'),
          yaxis = list(
            zerolinecolor = '#ffff',
            zerolinewidth = 2,
            gridcolor = '#ffff')
          )

fig
  1. Looking at the graph, we can see seasonal behavior (we will perform a decomposition to confirm this). The highest temperature recorded during this period is 97 °F, in July 1977. The lowest temperature is 19 °F, recorded in January 1976 and January 1977. For precipitation, September 1975 had the highest rainfall. As expected, temperature is high during the summer months and low in the winter months. In most years, September and October have the highest rainfall of the year. A possible explanation is that summer heat increases evaporation, which leads to rain afterward. We can see the lag: high temperatures are followed by high rainfall. Conversely, months with low temperatures usually have lower precipitation.

  2. Decomposition

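# Pass the TOBS column as a vector: stl() only accepts a univariate series.
temp <- soldier_home$TOBS
# Note: end = 1977 stops the series at the start of 1977, excluding most of that year.
ts_temp <- ts(temp, start = 1973, end = 1977, frequency = 365)
stl_temp <- stl(ts_temp, s.window = "periodic")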

seasonal_stl_temp   <- stl_temp$time.series[,1]
trend_stl_temp    <- stl_temp$time.series[,2]
random_stl_temp  <- stl_temp$time.series[,3]
 
# Decomposition
autoplot(stl_temp) 

The trend is relatively constant from 1973 to 1976, but it decreases significantly between 1976 and 1977. The seasonality is clearly visible, which confirms our finding above. The random component is noisy and volatile. As expected, the weather is unpredictable: this year’s weather does not indicate what will happen next year.
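
To make the three components discussed above more tangible, the following sketch runs `stl()` on a synthetic monthly series (an assumption for illustration, not the temperature data) and checks two properties of the result: the seasonal, trend and remainder columns add back up to the original series, and with `s.window = "periodic"` the seasonal pattern is identical in every year.

```r
set.seed(42)
n_years <- 6
month_idx <- seq_len(12 * n_years)

# Synthetic monthly series: linear trend + yearly cycle + noise.
x <- ts(50 + 0.1 * month_idx + 10 * sin(2 * pi * month_idx / 12) +
          rnorm(length(month_idx)),
        start = c(1973, 1), frequency = 12)

fit <- stl(x, s.window = "periodic")
parts <- fit$time.series
colnames(parts)  # "seasonal" "trend" "remainder"

# The three components add back up to the original series.
recombined <- as.numeric(rowSums(parts))
all.equal(recombined, as.numeric(x))  # TRUE

# With s.window = "periodic" the seasonal pattern repeats every year.
seasonal <- as.numeric(parts[, "seasonal"])
all.equal(seasonal[1:12], seasonal[13:24])  # TRUE
```

The same column names apply to `stl_temp$time.series`, which is why the components can be extracted by position or by name.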

  1. Washington, D.C. does have “good” weather. With temperatures ranging from 19 °F to 97 °F, it is not too cold in winter (compared to northern states) and not too hot in summer. During this five-year period, the weather stayed fairly constant, with little change in temperature and precipitation.